A/B Testing 


Research Design & Statistical Power Analysis





An A/B test is a controlled experiment
• All elements except one are held constant
o Variants differ only by the variable being tested
• Related terms: split tests, CRO, MVT, growth hacking

• Get information from actual users
o In real time, unbiased
o Not focus groups or surveys

• Many types of variants are tested
o Color, size, button, algorithms, flow
o UX, ads, apps, site pages, email


Control Group Experimental Group 
Can you tell which is which? :) 


Mobile A/B Testing 


[Figure: Variation A vs. Variation B]

An A/B test is an experiment
• Make data-driven decisions (DDDM)
• Generate repeatable results
• Scientifically establish causality
o Link a change in a metric to a change in the product
• Understand and uncover the effect of changes quickly
• Ultimately, increase revenue


1. Prerequisites
2. Experimental Design
3. Conduct Experiment
4. Analysis of Results
5. Decision/Recommendation
6. Document your learning

[Figure: workflow steps: Define goal, Identify metric, Develop hypothesis, Set up, Run, Results]


A hypothesis should always:
- explain what you expect to happen
- be clear and understandable
- be testable
- be measurable
- contain an independent and dependent variable

• Define key metrics
o Revenue, user, engagement
o Stakeholders, desired outcome?
o Develop appropriate hypothesis

• Establish the change process
o Easy to implement, resource sensitive
o Clear link to key metric(s)
o Product to represent variant-desired info

• Determine the randomization unit
o Who or what is randomly allocated
o Session, users or page views, daily or weekly
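
One common way to allocate a randomization unit is to hash its identifier, so the same user or session always lands in the same variant. The sketch below is a minimal illustration, not a prescribed method; the function name `assign_variant`, the experiment name, and the user id are hypothetical.

```python
import hashlib


def assign_variant(unit_id, experiment, variants=("control", "treatment")):
    """Deterministically map a randomization unit (e.g. a user id) to a variant.

    Hashing the experiment name together with the unit id keeps the
    assignment stable across visits and independent across experiments.
    """
    key = f"{experiment}:{unit_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    bucket = int(digest, 16) % len(variants)  # roughly uniform bucket index
    return variants[bucket]


# Hypothetical usage: the same user always receives the same variant.
variant = assign_variant("user-123", "button-color-test")
```

If the unit is a session or page view instead of a user, only the id passed in changes; the allocation logic stays the same.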


1.1 Choosing the Right Metrics 


• Goal/Success Metric
o Company's mission/vision, KPIs, OKRs
o Not easy to measure
o Long-term, e.g. increase overall brand awareness

• Driver Metric
o Indirect: influences the actions of others
o Predictive: forecasts outcomes, costs, or effects
o Aligned with the goal metric, but more actionable
o Short-term, e.g. increase sales by 10% this quarter

[Figure: goal types. One type aims to increase profit by a specific percentage in a specific time period; another is focused on expanding the company. Credit: @webcoredigital]





• Attributes of a good metric
o Simple: easy to define and understand
o Actionable: easy to implement, suitable
o Clear: easy to interpret, gives clear insight
o Measurable: directly from the data
o Attributable: to the experiment variant
o Timely: detectable during the experiment duration

Vanity metrics: feel good to look at but lack guidance for next steps.
Actionable metrics: can be used to inform better business decisions.





[Figure: experimental and control groups, each with specimen/samples and evidence]

• Population
o Age, location, time zone

• Sample Size
o Statistical power drives the required sample size
o Acceptable practical significance

• Duration of the experiment
o Seasonality, cadence
o Primacy & novelty effect
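
The duration follows from the required sample size and the available traffic. A minimal sketch, assuming a fixed number of eligible units per day and rounding up to whole weeks so every weekday is covered equally (the weekly rounding is an assumption made here to reflect the seasonality bullet; all numbers are illustrative):

```python
import math


def experiment_duration_days(n_per_group, n_groups, daily_units, full_weeks=True):
    """Estimate how many days are needed to collect the required sample.

    n_per_group: required observations per variant
    n_groups:    number of variants (2 for a simple A/B test)
    daily_units: eligible randomization units arriving per day (assumed constant)
    """
    total_needed = n_per_group * n_groups
    days = math.ceil(total_needed / daily_units)
    if full_weeks:
        # Round up to whole weeks to average out day-of-week effects.
        days = math.ceil(days / 7) * 7
    return days


# Illustrative numbers only: 5,000 per group, 2 groups, 1,200 units per day.
days_needed = experiment_duration_days(5000, 2, 1200)
```

Running longer than the bare minimum also gives primacy and novelty effects time to fade.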





• Minimum meaningful effect
o Domain expertise
o Previous research

• Statistical vs. meaningful significance
o p-value & confidence interval

[Figure labels: Confidence Level, Standard Error]

• Depends on the statistical test
o Type II error or power (β = 0.2, i.e. power = 0.8)
o Correctly reject H0

• Level of sensitivity
o Minimum detectable effect

• More samples = smaller error, more power
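
To make the p-value and confidence interval concrete, here is a minimal sketch of a two-sided z-test for the difference between two conversion rates, using only the Python standard library. The function name and the conversion counts are made up for illustration, and the normal approximation is assumed to hold (large samples).

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ztest(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """Two-sided z-test and confidence interval for p_b - p_a."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    diff = p_b - p_a

    # Pooled standard error: used for the test statistic under H0 (no difference).
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se_pooled = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = diff / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Unpooled standard error: used for the confidence interval of the difference.
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    ci = (diff - z_crit * se, diff + z_crit * se)
    return diff, z, p_value, ci


# Illustrative data: 200/2000 conversions in control, 250/2000 in treatment.
diff, z, p_value, ci = two_proportion_ztest(200, 2000, 250, 2000)
```

Because the standard error shrinks with the square root of the sample size, quadrupling both groups halves the error, which is the "more samples = smaller error" point above.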








• Effect Size: magnitude of the result
o Tells the magnitude of the difference
o Many ways to measure
- Pearson correlation, Cohen's d, odds ratio (OR)
• Sample Size
o Number of observations
o Size of the experimental data
• Statistical Power
o Probability of accepting Ha when it is true
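
As one example of an effect size measure, Cohen's d divides the difference between two group means by their pooled standard deviation. A minimal sketch with made-up data (the function name and the sample values are illustrative):

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(sample_a, sample_b):
    """Cohen's d: standardized mean difference using the pooled standard deviation."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = (
        (n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)
    ) / (n_a + n_b - 2)
    return (mean(sample_b) - mean(sample_a)) / sqrt(pooled_var)


# Illustrative data only.
control = [1, 2, 3, 4, 5]
treatment = [2, 3, 4, 5, 6]
d = cohens_d(control, treatment)
```

Because d is scale-free, it can be compared across metrics measured in different units.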












[Table: Test type | Effect size formula | Effect size class intervals (low, medium, high effect size). Test types listed: chi-square test, ANOVA, correlation analysis, multiple linear regression.]

Standardized effect size estimate: a scale-free effect size estimate
• d family ("standardized mean difference"): Cohen's d, Hedges' g, Glass' Δ
• r family (variations of correlations, "variance accounted for"): e.g. point-biserial correlation
• Other: odds ratio

Unstandardized effect size estimate: a measure expressed in the original outcome scale or in terms of percentages/proportions
• Mean difference
• Median difference
• Difference in percentage or proportions
• Ratio of mean values
• Other

[Table: sample size formulas. Columns: Formula given by author | Correct formula without standardized difference (effect size) | Correct formula with standardized difference (effect size). Entries are built from the terms $[Z_{\alpha/2} + Z_{\beta}]$, $2\,p\,(1 - p)$ and $d = (p_1 - p_2)$.]

Effect Size Indicator: Definition
• Product moment correlation (r) and functions of r
o Pearson r: $\sum Z_x Z_y / N$
o r/k: $r / \sqrt{1 - r^2}$
o $Z_r$: $\tfrac{1}{2} \log_e \left[ (1 + r) / (1 - r) \right]$
o Cohen's q (a): $Z_{r_1} - Z_{r_2}$
• Standardized differences between means
o Cohen's d: $(M_1 - M_2) / \sigma_{\text{pooled}}$
o Glass's Δ: $(M_1 - M_2) / S_{\text{control group}}$
o Hedges's g: $(M_1 - M_2) / S_{\text{pooled}}$
• Differences between proportions
o Cohen's g: $P - .50$
o d': $P_1 - P_2$
o Cohen's h (b): $p_1 - p_2$

a. This is an effect size indexing the magnitude of the difference between two effect sizes.
b. P's are first transformed to angles measured in radians: $2 \arcsin \sqrt{P}$.
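
Footnote b can be made concrete with a few lines of code: each proportion is transformed to an angle, 2 arcsin √P, and Cohen's h is the difference between the two angles. A minimal sketch (the function name and the example rates are illustrative):

```python
from math import asin, sqrt


def cohens_h(p1, p2):
    """Cohen's h: difference between two arcsine-transformed proportions."""
    phi1 = 2 * asin(sqrt(p1))  # angle in radians for the first proportion
    phi2 = 2 * asin(sqrt(p2))  # angle in radians for the second proportion
    return phi1 - phi2


# Illustrative conversion rates: 12.5% in treatment vs 10% in control.
h = cohens_h(0.125, 0.10)
```

The transform stabilizes the variance, so the same h represents a comparably detectable difference whether the proportions sit near 0.5 or near the extremes.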


• Revenue vs engagement
• Experimental vs implementation cost

• Practical vs meaningful significance
o Cost of implementation vs effect size

• Sanity checks
o Final size, randomization, consistency
o Test of assumptions, failures, etc.
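
One possible sanity check on final size and randomization is to test whether the observed group sizes match the planned split, sometimes called a sample ratio mismatch check. The sketch below is an illustration chosen here, not a check named in the slides; it uses a chi-square goodness-of-fit test with one degree of freedom, and the counts are made up.

```python
from math import erfc, sqrt


def sample_ratio_pvalue(n_control, n_treatment, expected_control_share=0.5):
    """P-value of a chi-square goodness-of-fit test on the group sizes (1 df)."""
    total = n_control + n_treatment
    exp_control = total * expected_control_share
    exp_treatment = total * (1 - expected_control_share)
    chi2 = (
        (n_control - exp_control) ** 2 / exp_control
        + (n_treatment - exp_treatment) ** 2 / exp_treatment
    )
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return erfc(sqrt(chi2 / 2))


# Illustrative counts: a planned 50/50 split that came out 5,000 vs 5,300.
p_srm = sample_ratio_pvalue(5000, 5300)
```

A very small p-value here suggests the allocation itself is broken, in which case the experiment results should not be trusted until the cause is found.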











• Power is 1 − β, where β is the probability of wrongly concluding there is no effect when one actually exists

• As the effect size increases, so does the power

• However, increasing the number of observations may bring diminishing returns

[Figure: Power of Test vs. Number of Observations]
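
The shape of this curve can be reproduced with a short calculation. The sketch below assumes a two-sided, two-sample z-test with equal group sizes and a standardized effect size d; these are assumptions for illustration, and the effect size of 0.3 is arbitrary.

```python
from math import sqrt
from statistics import NormalDist


def power_two_sample(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test for effect size d."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    shift = d * sqrt(n_per_group / 2)  # expected value of the z statistic
    # Probability that the statistic falls in either rejection region.
    return (1 - nd.cdf(z_crit - shift)) + nd.cdf(-z_crit - shift)


# Power for a fixed effect size as the sample doubles.
for n in (25, 50, 100, 200, 400, 800):
    print(n, round(power_two_sample(0.3, n), 3))
```

Under these assumptions, going from 400 to 800 observations per group adds far less power than going from 100 to 200, which is the diminishing return described above.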













[Figure: sample size formulas with terms $\exp(-\ln(2)\,F/m)$, $\exp(-\ln(2)\,A/m)$ and $\ln(2)\,A/m$, and $n = (C + E)/2$]

$\beta(\delta) = \Pr(T > 1.64 \mid \mu = \delta)$
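
The power function above can be expanded under some assumptions that the slide does not state: $T = \sqrt{n}\,\bar{X}/\sigma$ is a one-sided z-statistic with known $\sigma$, $n$ observations, and the critical value 1.64 corresponds to $\alpha \approx 0.05$. Note that here $\beta(\delta)$ denotes the power at a true effect $\delta$, not the Type II error rate.

```latex
\begin{align*}
\beta(\delta) &= \Pr(T > 1.64 \mid \mu = \delta) \\
              &= \Pr\!\left(Z > 1.64 - \frac{\delta\sqrt{n}}{\sigma}\right),
                 \qquad Z \sim N(0, 1) \\
              &= 1 - \Phi\!\left(1.64 - \frac{\delta\sqrt{n}}{\sigma}\right) \\
\beta(0)      &= 1 - \Phi(1.64) \approx 0.05
\end{align*}
```

Under these assumptions the power grows with both the effect $\delta$ and the sample size $n$, and equals the significance level when there is no effect.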
https://machinelearningmastery.com/statistical-power-and-power-analysis-in-python/


https://towardsdatascience.com/how-to-use-python-to-figure-out-sample-sizes-for-your-study-87 110a76a19c 




https://carlosgrande.me/sample-size-determination/ 


https://www.geeksforgeeks.org/introduction-to-power-analysis-in-python/ 





Statistical Power & Significance Testing: an interactive visualization
https://rpsychologist.com/d3/nhst/

Settings
• Solve for: Power
• Power (1 − β = 0.8)
• Significance level (α = 0.05)
• Effect size (d = 0.67)
• One-tailed / Two-tailed

Type I error: 5%
Type II error: 20%
Sample size: 17.48
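
The sample size shown for these settings can be reproduced with the normal-approximation formula n = ((z_α + z_β) / d)². The sketch below assumes a one-sample, two-tailed z-test; that assumption is made here because it matches the displayed value, not because the visualization documents it. The function name is illustrative.

```python
from statistics import NormalDist


def required_sample_size(d, alpha=0.05, power=0.8, two_tailed=True):
    """Sample size for a one-sample z-test with standardized effect size d."""
    nd = NormalDist()
    # Critical value for the chosen significance level.
    z_alpha = nd.inv_cdf(1 - alpha / 2) if two_tailed else nd.inv_cdf(1 - alpha)
    # Quantile corresponding to the desired power (1 - beta).
    z_beta = nd.inv_cdf(power)
    return ((z_alpha + z_beta) / d) ** 2


n = required_sample_size(0.67, alpha=0.05, power=0.8)
print(round(n, 2))  # 17.48
```

In practice the result is rounded up to the next whole observation, and a one-tailed test with the same settings needs fewer observations.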





